Python for Automation | File handling

This post gives a short, clear introduction to Python for automation. Automation has become an essential part of our work, helping us save time and improve efficiency. Python, with its simplicity, versatility, and extensive libraries, has emerged as a powerful tool for automating a wide range of tasks. Whether you're a beginner or an experienced developer, this post will give you a starting point for using Python's automation capabilities, with a focus on file handling.

Many languages are available, so a common question is why we choose Python for automation. Let's see why.

Python offers several advantages that make it an ideal choice for automation:

Simplicity and Readability: Python's clean syntax and intuitive structure make it easy to learn and understand, even for beginners.

Extensive Library Ecosystem: Python has a vast collection of libraries and packages designed specifically for automation, enabling developers to leverage existing tools and functionalities.

Cross-platform Compatibility: Python runs seamlessly on major operating systems, making it suitable for automating tasks across different platforms.

As a beginner, follow the steps below to get started with Python for automation:

Installing Python: Download and install the latest version of Python from the official Python website, choosing the appropriate installer for your operating system. Always download software from its official site, not just Python.

Setting Up the Development Environment: Explore popular Integrated Development Environments (IDEs) such as PyCharm, Visual Studio Code, or Jupyter Notebook, and configure your environment for efficient coding.

Of these, Jupyter Notebook is my favourite because it can execute code cell by cell, which is very helpful for beginners. As a developer, I also like Visual Studio Code: it is helpful when you deploy code to Azure Functions, and it makes Git operations easy.

Understanding Scripting Basics: Familiarize yourself with the fundamentals of scripting, including variables, data types, control flow structures, and functions.
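As a rough illustration of those fundamentals, here is a minimal sketch that uses a variable, a dictionary, a function, and control flow together. The file names, sizes, and thresholds are made up for the example.

```python
# Hypothetical example data: file names and their sizes in bytes.
file_sizes = {"report.csv": 2_048, "notes.txt": 120, "backup.zip": 5_000_000}


def classify(size_bytes: int) -> str:
    """Return a rough size label for a file (thresholds are arbitrary)."""
    if size_bytes < 1_024:
        return "small"
    elif size_bytes < 1_048_576:
        return "medium"
    return "large"


# Loop over the dictionary and build a new one holding the labels.
labels = {name: classify(size) for name, size in file_sizes.items()}
print(labels)
```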

From a Data Engineer's perspective, most of the work involves file handling, so you need to focus on that as well. It will help you most of the time, and I strongly recommend learning file handling in Python.

First, why do we need to automate file handling?

Manually manipulating files can quickly become tedious, especially when dealing with large datasets or complex file structures. Automation offers several benefits:

Time Savings: Automation eliminates the need for manual intervention, allowing you to focus on more critical tasks.

Consistency: Automated processes follow predefined rules consistently, reducing the risk of human errors.

Scalability: Once set up, automated scripts can handle increasing workloads without additional effort.

Complex Operations: Automation enables you to perform intricate file operations that would be cumbersome manually.

As a first step, learn how to open and close a file. Then move on to copying and moving files, and after that to renaming them. Everything beyond that is processing the data: once we have read the lines from a file, we can loop over them and perform any task on the data. That is file handling in a nutshell, but it is a vast topic.
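The learning path above can be sketched with the standard library modules shutil and pathlib. This is only one possible way to do it; the file and folder names are hypothetical, and everything runs inside a temporary directory so no real files are touched.

```python
import shutil
import tempfile
from pathlib import Path

# Work inside a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    source = base / "example.txt"  # hypothetical file name
    source.write_text("Hi\nI am a Data Engineer\nBye\n")

    # Copy the file, keeping the original in place.
    copy_path = base / "example_copy.txt"
    shutil.copy(source, copy_path)

    # Move the copy into a sub-folder.
    archive = base / "archive"
    archive.mkdir()
    moved_path = archive / "example_copy.txt"
    shutil.move(str(copy_path), str(moved_path))

    # Rename the moved file.
    renamed_path = archive / "example_v1.txt"
    moved_path.rename(renamed_path)

    # Read the lines and process each one (here: count its characters).
    with open(renamed_path) as f:
        line_lengths = [len(line.rstrip("\n")) for line in f]

    original_still_exists = source.exists()
    renamed_exists = renamed_path.exists()

print(line_lengths)
```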

From the data domain perspective, we need to know a few things beyond the points above. The next basic skill to learn is connecting to a database, because as Data Engineers we work with SQL most of the time. We can write our query, connect to the database, and execute it. This way we keep control of the SQL, and we can build automation on top of it by templating the SQL query.
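A minimal sketch of a templated query follows. It assumes an in-memory SQLite database as a stand-in for whatever database you actually use, and the table, columns, and the fetch_orders helper are invented for the example. The query uses a placeholder that the driver fills in at execution time, which is generally safer than building the SQL string by hand.

```python
import sqlite3

# An in-memory SQLite database stands in for the real database here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 120.0), (2, "US", 80.0), (3, "EU", 45.5)],
)

# The query is a template: the '?' placeholder is filled in at execution time.
QUERY_TEMPLATE = "SELECT id, amount FROM orders WHERE region = ? ORDER BY id"


def fetch_orders(connection, region):
    """Run the templated query for one region and return all rows."""
    return connection.execute(QUERY_TEMPLATE, (region,)).fetchall()


eu_orders = fetch_orders(conn, "EU")
us_orders = fetch_orders(conn, "US")
print(eu_orders)
conn.close()
```

Other database drivers follow the same connect, execute, fetch pattern, although the placeholder style can differ from driver to driver.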

Building on the above, let us assume we are Azure Data Engineers. In that case we need to know how to work with Azure Functions, where we can run Python code. That code can be called from Azure Data Factory using the Azure Function activity, which lets us pass parameters that our Python function receives. Inside the function we can keep a SQL query in template form, where placeholders are replaced with the parameter values received as input. We then execute the resulting query against the database and fetch the result. The result can be processed further and, if needed, used to update tables in the database.

Let's look at some of the key points in file handling.
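The flow described above (receive parameters, fill a query template, fetch a result, update a table) might look roughly like this. Note the assumptions: this sketch does not use the Azure Functions SDK at all. A plain function named handle_request stands in for the function entry point, a dictionary stands in for the parameters sent by the pipeline, and an in-memory SQLite database with an invented jobs table stands in for the real database.

```python
import sqlite3

# Templates with named placeholders; the values come from the caller's parameters.
SELECT_TEMPLATE = "SELECT COUNT(*) FROM jobs WHERE status = :old_status"
UPDATE_TEMPLATE = "UPDATE jobs SET status = :new_status WHERE status = :old_status"


def handle_request(params, connection):
    """Hypothetical stand-in for the function body that receives pipeline parameters."""
    # Fetch a result first, then use it to decide whether an update is needed.
    (matching,) = connection.execute(SELECT_TEMPLATE, params).fetchone()
    if matching:
        connection.execute(UPDATE_TEMPLATE, params)
        connection.commit()
    return {"updated_rows": matching}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?)", [(1, "new"), (2, "new"), (3, "done")]
)

response = handle_request({"old_status": "new", "new_status": "queued"}, conn)
statuses = [row[0] for row in conn.execute("SELECT status FROM jobs ORDER BY id")]
print(response, statuses)
conn.close()
```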

Open:

f = open(file_name, mode)

You may wonder what mode is. It can be one of the following:

r : opens an existing file for reading. It raises an error if the file does not exist.

w : opens a file for writing. If the file exists, its content is overwritten; otherwise a new file is created.

a : opens a file for appending. It creates a new file if the file does not exist.

r+ : opens a file for reading and writing. It does not create a new file if the file does not exist.

w+ : opens a file for reading and writing. It creates a new file if the file does not exist and overwrites the content if it does.

a+ : opens a file for appending and reading. It creates a new file if the file does not exist.
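To see how some of these modes behave, here is a small sketch that can be run safely. The file name is hypothetical, and the file lives in a temporary directory that is deleted at the end.

```python
import os
import tempfile

# Use a temporary directory so the demo does not touch real files.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modes_demo.txt")  # hypothetical file name

    # 'r' on a missing file raises FileNotFoundError.
    try:
        open(path, "r")
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True

    # 'w' creates the file (or empties it if it already exists).
    with open(path, "w") as f:
        f.write("first\n")

    # 'a' keeps the existing content and adds to the end.
    with open(path, "a") as f:
        f.write("second\n")

    # 'r+' reads and writes without truncating; the file must exist.
    with open(path, "r+") as f:
        after_append = f.read()

    # 'w' again: the previous content is discarded.
    with open(path, "w") as f:
        f.write("third\n")
    with open(path, "r") as f:
        after_overwrite = f.read()

print(repr(after_append), repr(after_overwrite))
```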

Example :

f = open('example.txt', 'r')
for line in f:
    # each line already ends with a newline, so suppress print's own
    print(line, end='')
f.close()

Output:
Hi
I am a Data Engineer
Bye

file.read():

This reads the entire file into a single string.

with open('example.txt') as f:
    data = f.read()
    print(data)

Output:
Hi
I am a Data Engineer
Bye

file.readlines() :

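This returns all the lines in the file as a list. Each line keeps its trailing newline character, so the output below assumes the file ends with a newline.

with open('example.txt') as f:
    data = f.readlines()
    print(data)

Output:
['Hi\n', 'I am a Data Engineer\n', 'Bye\n']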

Now that we have read the file, we can process the data as required and write the result to a file.

write:

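Assume the required output is stored as a string in the variable result. The file must be opened in write mode ('w'), because the default mode is read-only.

with open('output.txt', 'w') as f:
    f.write(result)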
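Putting the pieces together, a read, process, write script could be sketched as follows. The transform function and its rules (strip whitespace, skip empty lines, upper-case the text) are purely illustrative, and the files are created in a temporary directory.

```python
import tempfile
from pathlib import Path


def transform(lines):
    """Strip whitespace, drop empty lines, and upper-case the rest."""
    return [line.strip().upper() for line in lines if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    input_path = Path(tmp) / "example.txt"
    output_path = Path(tmp) / "output.txt"
    input_path.write_text("Hi\n\nI am a Data Engineer\nBye\n")

    # Read all lines, process them, then write the result in 'w' mode.
    with open(input_path) as src:
        processed = transform(src.readlines())
    result = "\n".join(processed) + "\n"
    with open(output_path, "w") as dst:
        dst.write(result)

    written = output_path.read_text()

print(written)
```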

What we have discussed here is just the beginning of file handling. As data engineers, we must dive deeper into it. I hope this post is helpful for beginners.

Thank you!
